DETAILED ACTION

Continued Examination Under 37 CFR 1.114 

1. A request for continued examination under 37 CFR 1.114, including the fee set forth in
37 CFR 1.17(e), was filed in this application after final rejection. Since this application is
eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e)
has been timely paid, the finality of the previous Office action has been withdrawn pursuant to
37 CFR 1.114. Applicant's submission filed on 11/21/07 has been entered.

Response to Amendment 

2. In response to the Office Action mailed 9I1\IQ1, Applicants have submitted an 
Amendment, filed 11/21/07, amending claims 1, 2, 8, 9, 13, 14, 20, 21 and 25, without adding
new matter, and arguing to traverse claim rejections. 

Response to Arguments 

3. Applicant's arguments with respect to claims 1-3, 8-10, 13-15, 20-22, 25 and 26 have
been considered but are moot in view of the new ground(s) of rejection, below. 



Claim Rejections - 35 USC § 112 

4. Claims 9, 21 and 25 have been amended and these changes are acceptable. Thus the 
rejections have been withdrawn. 



Application/Control Number: 10/688,041 Page 3

Art Unit: 2626 

Claim Objections 

5. Claims 1, 2, 8, 9, 13, 14, 20 and 25 have been amended and these changes are acceptable. 
Thus the objections have been withdrawn. 

Claim Rejections - 35 USC § 103 

6. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 
102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the 
subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary 
skill in the art to which said subject matter pertains. Patentability shall not be negatived by the manner in which the 
invention was made. 

7. Claims 1-3, 8-10, 13-15, 20-22, 25 and 26 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Yamazaki (US Patent 5,864,814), in view of Miyatake et al. (hereinafter
"Miyatake," US Patent 5,842,167) and Campbell et al. (hereinafter "Campbell," US Patent
6,366,883), and further in view of Hejna, Jr. (US Patent 7,043,433).

Regarding claims 1, 13 and 25, Yamazaki teaches a system, machine-readable storage,
and computer-implemented method for debugging and tuning synthesized audio (col. 22, line 28
- col. 26, line 28; Figs. 26-33), comprising the steps of: (c) displaying a waveform corresponding
to the synthesized audio generated from concatenated phonetic units; and (e) displaying an
original recording containing a selected phonetic unit (col. 23, ll. 17-24, teaches "original
waveform display window...synthesized waveform display window"); and (d) displaying
parameters corresponding to at least one of the phonetic units (col. 23, ll. 41-49, "correlation
between the parameters on the time axis"; col. 23, line 66 - col. 24, line 43, "pitch pattern
W1...pitch pattern W2 of the synthesized waveform...pitch label"; col. 25, ll. 17-32, "velocity
indicating a volume"; col. 23, ll. 41-49, "correlation between the parameters on the time axis";
col. 25, ll. 61-67, "change of velocity...change of pitch...manual operation"; col. 26, ll. 23-28,
"parameter"; Fig. 33 illustrates the pitch/velocity for a certain phoneme, which provides a visual
indication of the pitch/velocity values (displaying parameters)).

Yamazaki teaches receiving voice-generating information (Abstract, col. 10, ll. 15-23).
Yamazaki does not explicitly teach, but Miyatake teaches, (a) receiving a user-supplied text with
a visual user interface; and (b) generating synthesized audio generated from concatenated
phonetic units, the synthesized audio being a voice rendering of the user-supplied text (col. 3, ll.
6-26, teaches "FIG. 1...inputting means...for inputting text data, a command and hand-written
characters...morpheme analyzing means 2...divide the text data into minimum language units
each having the meaning...speech language processing means 4 determines synthesis units
which are suitable for producing a sound from text data thereby to generate prosodic data";
Abstract, "synthesizing speech from text data...text data displayed on a screen" and Figs. 1-4).
It would have been obvious for one of ordinary skill in the art at the time the invention was made
to receive textual input and generate from the input synthesized audio as in Miyatake because
text-to-speech synthesizing technology is readily available, and as Miyatake teaches in col. 2, ll.
32-34, this method also allows for operation in response to receiving hand-written text data.

Yamazaki does not teach (d) the parameters including configuration parameters
comprising at least one weight for adjusting at least one search cost function, the at least one
weight comprising at least one of a pitch cost weight and a duration cost weight. However,
Campbell teaches search cost function weights, at least one weight for adjusting at least one
search cost function, the at least one weight comprising at least one of a pitch cost weight and a
duration cost weight (col. 9, line 58 - col. 10, line 11, teaches, "optimal weighting
coefficient...[e]mployed in this case are phonemic features such as intonation position and
intonation mode as well as prosodic feature parameters such as...phoneme duration..."; col. 15,
ll. 28-36, teaches "weighting coefficient...target sub-costs in order to select out a speech sample
that is the closest when the acoustic distances of the target speech unit, if possible, could be
directly determined"; Fig. 3). It would have been obvious for one of ordinary skill in the art at
the time the invention was made to modify the teaching elements of Yamazaki and Miyatake
with Campbell because Campbell teaches an advantage is that the speech segments of the speech
waveform signals in the speech waveform database can be directly utilized (col. 15, ll. 33-36).
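
For illustration only, the role of a pitch cost weight and a duration cost weight in a unit-selection search can be sketched as follows. This sketch is not taken from Campbell, Yamazaki, Miyatake, or the application; the names `Unit`, `target_cost` and `select_unit`, the normalized-difference sub-costs, and the sample values are all hypothetical assumptions chosen to show how changing a weight can change which candidate unit is selected.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """Hypothetical speech unit described only by pitch and duration."""
    pitch_hz: float
    duration_ms: float


def target_cost(target: Unit, candidate: Unit,
                pitch_weight: float, duration_weight: float) -> float:
    """Weighted sum of two sub-costs (assumed form, not from any cited reference)."""
    # Each sub-cost is the relative difference from the target value.
    pitch_cost = abs(target.pitch_hz - candidate.pitch_hz) / target.pitch_hz
    duration_cost = abs(target.duration_ms - candidate.duration_ms) / target.duration_ms
    return pitch_weight * pitch_cost + duration_weight * duration_cost


def select_unit(target: Unit, candidates: list[Unit],
                pitch_weight: float = 1.0, duration_weight: float = 1.0) -> Unit:
    """Return the candidate with the lowest weighted target cost."""
    return min(
        candidates,
        key=lambda c: target_cost(target, c, pitch_weight, duration_weight),
    )


# Hypothetical data: one candidate is close in pitch, the other close in duration.
target = Unit(pitch_hz=120.0, duration_ms=80.0)
close_in_pitch = Unit(pitch_hz=125.0, duration_ms=120.0)
close_in_duration = Unit(pitch_hz=150.0, duration_ms=82.0)
candidates = [close_in_pitch, close_in_duration]

# Emphasizing pitch selects one unit; emphasizing duration selects the other.
by_pitch = select_unit(target, candidates, pitch_weight=1.0, duration_weight=0.0)
by_duration = select_unit(target, candidates, pitch_weight=0.0, duration_weight=1.0)
```

Under these assumed values, weighting only pitch selects the first candidate and weighting only duration selects the second, which is the sense in which a weight "adjusts" a search cost function.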

Yamazaki teaches: (f) receiving an editing input from the user (col. 25, line 61 - col. 26,
line 16, "change of velocity...change of pitch...manual operation"); and (g) adjusting at least
one configuration parameter in accordance with the editing input wherein adjusting includes
repositioning a phonetic alignment marker (col. 23, ll. 29-40, teaches in "step S104, to set a
duration length of each phoneme in relation to the original waveform displayed on the original
waveform display window 25B, labels [phonetic alignment markers] each separating phonemes
from each other along the direction of a time axis are given through a manual operation"; col. 26,
ll. 3-5, teaches, "If change of a label is requested (step S114), system control returns to step
S104, and the label [phonetic alignment marker] is changed [repositioned] through a manual
operation"; see Fig. 24, elements S104 and S114).

Yamazaki teaches wherein the at least one configuration parameter is stored in a
configuration file (Fig. 33; col. 25, ll. 40-52, "new filing"; col. 25, line 61 - col. 26, line 28,
"change of velocity...change of pitch...for each label"; col. 26, ll. 29-53, "change of parameters
in...original synthesized waveform...object for editing"; col. 23, ll. 1-16, "making new
voice-generating information").

Yamazaki fails to teach, but Miyatake teaches, wherein the at least one configuration
parameter is stored in a text-to-speech engine configuration file (col. 1, ll. 55-62, "receives text
data and edition data...synthesizes speech corresponding to the character data"; col. 4, ll. 20-44,
"pauses are created"; Figs. 3-4; Miyatake teaches a visual speech synthesis editing system, like
Yamazaki, and where it is used in a text-to-speech environment). Therefore, it would have been
obvious to modify the teaching elements of Yamazaki with Miyatake in order to use editing data
to produce a natural sounding spoken output from text, as described by Miyatake (col. 1, ll. 55-62
and ll. 20-30).

Yamazaki, Miyatake and Campbell do not explicitly teach, but Hejna, Jr. suggests: (h)
highlighting in the display of the original recording at least one user-selected phonetic unit (Fig.
25, ll. 15-66, teaches, "produce a two dimensional graph...time and possibly text or phonetic
words, displayed on a horizontal axis...displays a two dimensional representation of the input
MW (the input audio or audio-visual work) with text or phonetic labels...well known...phonetic
information...displayed as an overlay on top of a graphical representation of a speech
waveform...Audience member (user) can highlight regions of the text displayed on Graphical
Display...to identify specific portions of the input MW that are associated with the highlighted
text"; see also Figs. 5 and 6, particularly element 6100). It would have been obvious for one of
ordinary skill in the art at the time the invention was made to include highlighting in the display
of the original recording at least one user-selected phonetic unit, as in Hejna, Jr., so that the user
can easily keep track of his or her selection visually.

Yamazaki teaches: (i) correcting elements of a text-to-speech segment dataset of
parameters corresponding to a segment of the synthesized audio identified to be problematic (col.
25, line 61 - col. 26, line 16, teaches operations for changing [correcting] a velocity, pitch,
phoneme, label, and/or voice tone setting);

(j) generating a new synthesized waveform corresponding to one or more adjusted
parameters (see, for example, Figs. 32 and 33, particularly elements 25E, E1, E11 and E12, which
illustrate new synthesized waveforms corresponding to a velocity adjustment, as described in col.
25, ll. 18-39 and 61-67); and

(k) repeating steps (b) - (j) until a desired synthesized output is generated (see Fig. 24;
col. 25, ll. 17-22, teaches "if it is determined...that an operation for terminating the processing
for making new voice-generating information has not been executed, and at the same time that an
operation for changing any parameter has not been executed, the processing...is repeatedly
executed"; see also col. 25, ll. 53-60).

Regarding claim 26, Yamazaki teaches wherein the parameter updates and segment
dataset corrections are applied in regenerating the synthesized audio (see, for example, Figs. 32,
33 and 37, particularly elements 25E, E1, E11 and E12, which illustrate new synthesized
waveforms corresponding to a velocity adjustment, as described in col. 25, ll. 18-39 and 61-67;
see col. 25, ll. 40-52, "new filing"; col. 25, line 61 - col. 26, line 28, "change of
velocity...change of pitch...for each label"; col. 26, ll. 29-53, "change of parameters
in...original synthesized waveform...object for editing"; and col. 23, ll. 1-16, "making new
voice-generating information"; the information is a set of data for the voice segment (segment
dataset) and is used to produce the display in Fig. 33, and edits change both the information and
the corresponding display).

Regarding claims 2 and 14, Yamazaki teaches displaying the parameters responsive to a
user selection of at least a portion of the waveform, the displayed parameters correlating to the
selected portion of the waveform (col. 26, ll. 29-47, "editing...basically the same
processing...file as an object for editing is selected...treated as an original waveform"; if a user
selects a file, then he/she selects the entire waveform, i.e., at least a portion of the waveform,
which is put on the display with the pitch, velocity, etc.).

Regarding claims 3 and 15, Yamazaki teaches identifying a portion of the waveform
responsive to a user selection of at least one of the parameters, the identified portion of the
waveform correlating to the selected parameters (col. 25, ll. 18-39, "velocity
adjustment...velocity E1 in a time zone for the phoneme of 'ka'...subdivided to velocity E11";
col. 25, ll. 61-67, "value of pitch...for each label"; Figs. 31-33; e.g., choosing the velocity for one
of the phonemes to be changed identifies the corresponding phoneme/portion so that that portion
of the waveform can be changed).

Regarding claims 8 and 20, Yamazaki teaches wherein said adjusting step comprises at
least one action selected from the group consisting of deleting a pitch mark, inserting a pitch
mark, and repositioning a pitch mark by deleting a phonetic unit label, adding a phonetic unit
label, modifying a phonetic unit label, and repositioning the phonetic unit boundaries (see col.
24, ll. 6-43, "deletion of a pitch label...adding a new label...movement"; and col. 25, line 63 -
col. 26, line 8).

Regarding claims 9 and 21, Yamazaki teaches wherein [said displaying parameters step]
further comprises the step of displaying a waveform from the original recording along with the
phonetic unit (col. 23, ll. 8-16, "natural voice is inputted...original waveform is displayed"; Fig.
33).

Regarding claims 10 and 22, Yamazaki fails to explicitly teach, but Miyatake teaches
wherein edits to the waveform adjust parameters in the segment dataset (Figs. 3-4, col. 4, ll. 20-36,
"displayed characters are edited by the inputting means...inputting means...separate...from
each other...pauses are created between"; inserting the pauses edits the waveform and adjusts the
starting and ending positions [parameters] of the phonemes).

Therefore, it would have been obvious for one of ordinary skill in the art at the time the
invention was made to modify the teaching elements of Yamazaki with Miyatake in order to
synthesize a natural speech that is close to the way a human being speaks, as taught by Miyatake
in col. 1, ll. 20-30.

Conclusion 

8. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Eunice Ng whose telephone number is 571-272-2854. The
examiner can normally be reached on Monday through Friday, 8:30 a.m. - 5:00 p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, David Hudspeth, can be reached on 571-272-7843. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the Patent
Application Information Retrieval (PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status information for unpublished 
applications is available through Private PAIR only. For more information about the PAIR 
system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR 
system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would 
like assistance from a USPTO Customer Service Representative or access to the automated 
information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/E. N./ 

Examiner, Art Unit 2626 

/David R Hudspeth/ 
Supervisory Patent Examiner, Art Unit 2626 



